Exploiting frequency-scaling invariance properties of the scale transform for automatic speech recognition
Abstract
This paper presents an experimental study of applying the scale transform to improve the performance of speaker-independent continuous speech recognition. Three major results are described. First, scale-transform based magnitude cepstrum coefficients (STCC) were compared with mel-scale filter bank cepstrum coefficients (MFCC) on a telephone-based connected digit recognition task. The STCC obtained performance close to that of the MFCC. Second, a simple frequency-normalization procedure applied to the scale-transform representation improved performance on the connected digit recognition task relative to the MFCC. Finally, in a more controlled experimental setting using the TIMIT database, applying phone-specific frequency warpings improved phone classification performance over using a single speaker-specific warping. This last result may have general implications for all speaker normalization procedures based on frequency warping.
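The frequency-scaling invariance named in the title can be illustrated numerically. The sketch below is not the paper's STCC front end. It assumes a toy spectral envelope with three hypothetical formant peaks and a hypothetical warping factor `alpha`, and it approximates the scale transform by an FFT over a uniform log-frequency grid. Scaling the frequency axis by `alpha` becomes a shift in log-frequency, which changes only the phase of the transform, so the magnitudes should agree up to numerical error.

```python
import numpy as np

def envelope(freq_hz):
    """Toy spectral envelope: three Gaussian 'formant' bumps (hypothetical values)."""
    centers = (500.0, 1500.0, 2500.0)
    width = 80.0
    return sum(np.exp(-0.5 * ((freq_hz - c) / width) ** 2) for c in centers)

def scale_transform_magnitude(func, log_f):
    """Magnitude of the scale transform of func, sampled on a uniform grid in ln(f).

    Substituting f = exp(x) turns the scale transform into an ordinary
    Fourier transform of func(exp(x)) * exp(x / 2), approximated by an FFT.
    """
    dx = log_f[1] - log_f[0]
    h = func(np.exp(log_f)) * np.exp(log_f / 2.0)
    return np.abs(np.fft.rfft(h)) * dx / np.sqrt(2.0 * np.pi)

alpha = 1.2  # hypothetical speaker-dependent frequency-scaling factor

def warped(freq_hz):
    """Frequency-scaled copy of the envelope, with energy-preserving sqrt(alpha) gain."""
    return np.sqrt(alpha) * envelope(alpha * freq_hz)

# Uniform grid in log-frequency, wide enough that both envelopes vanish at the edges.
log_f = np.linspace(np.log(1.0), np.log(50_000.0), 4096)
freq = np.exp(log_f)

mag_ref = scale_transform_magnitude(envelope, log_f)
mag_warp = scale_transform_magnitude(warped, log_f)

# The envelopes differ clearly on the frequency axis ...
env_diff = np.max(np.abs(envelope(freq) - warped(freq)))
# ... but their scale-transform magnitudes nearly coincide.
max_rel_diff = np.max(np.abs(mag_ref - mag_warp)) / np.max(mag_ref)

print("max envelope difference:", env_diff)
print("max relative difference of magnitudes:", max_rel_diff)
```

A recognizer built on such magnitudes would still need cepstral processing and a way to handle real, noisy spectra, which this sketch does not attempt.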
Similar resources
Evaluation of the Parameters Involved in the Iris Recognition System
Biometric recognition is an automatic identification method based on unique features or characteristics possessed by human beings. Iris recognition has proved to be one of the most reliable biometric methods available, owing to the accuracy provided by its unique epigenetic patterns. The main steps in any iris recognition system are image acquisition, iris segmentation, iris norm...
Speaker normalization for automatic speech recognition - An on-line approach
We propose a method to transform the on-line speech signal so as to comply with the specifications of an HMM-based automatic speech recognizer. The spectrum of the input signal undergoes a vocal tract length (VTL) normalization based on differences of the average third formant F3. The high-frequency gap generated after scaling is estimated by means of an extrapolation scheme. Mel scale c...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by enabling natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Exploiting lower face symmetry in appearance-based automatic speechreading
Appearance-based visual speech feature extraction is being widely used in the automatic speechreading and audio-visual speech recognition literature. In its most common application, the discrete cosine transform (DCT) is utilized to compress the image of the speaker’s mouth region-of-interest (ROI), and the highest energy spatial frequency components are retained as visual features. Good genera...
Robust recognition of children's speech
Developmental changes in speech production introduce age-dependent spectral and temporal variability in the speech signal produced by children. Such variabilities pose challenges for robust automatic recognition of children’s speech. Through an analysis of age-related acoustic characteristics of children’s speech in the context of automatic speech recognition (ASR), effects such as frequency sc...